Abstract



We investigate the existence of bounded-memory consistent estimators of various statistical functionals. This question is resolved in the negative in a rather strong sense. We propose various bounded-memory approximations, using techniques from automata theory and stochastic processes. Some questions of potential interest are raised for future work.

1 Introduction 

It is well known that the empirical average of independent, identically distributed (iid) random variables rapidly converges to their expectation. For concreteness, suppose that $X_i$, $i = 1, \ldots, n$, are iid random variables with mean $\theta$ taking values in $[0,1]$. Then Hoeffding's bound (Hoeffding, 1963) states that for all $\varepsilon > 0$, we have

$$\mathbb{P}\left\{\frac{1}{n}\sum_{i=1}^{n} X_i > \theta + \varepsilon\right\} \le \exp(-2n\varepsilon^2), \qquad (1)$$

with an analogous bound for the left tail. The inequality (1) guarantees the convergence of $A : (X_1, \ldots, X_n) \mapsto n^{-1}\sum_{i=1}^{n} X_i$ to $\theta$ in probability, implying that $A$ is a consistent estimator for $\theta$.
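The inequality (1) is easy to probe numerically. The following Python sketch is illustrative only (the function names, seed and trial count are our own choices): it estimates the right tail of the empirical mean of iid Bernoulli($\theta$) variables by simulation and evaluates the bound $\exp(-2n\varepsilon^2)$.

```python
import math
import random


def hoeffding_bound(n: int, eps: float) -> float:
    """Right-hand side of (1): exp(-2 n eps^2)."""
    return math.exp(-2 * n * eps ** 2)


def empirical_right_tail(theta: float, n: int, eps: float,
                         trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of P{(1/n) * sum(X_i) > theta + eps}
    for X_1, ..., X_n iid Bernoulli(theta)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        ones = sum(1 for _ in range(n) if rng.random() < theta)
        if ones / n > theta + eps:
            exceed += 1
    return exceed / trials
```

With $\theta = 0.5$, $n = 100$ and $\varepsilon = 0.1$, the simulated tail sits well below $e^{-2} \approx 0.135$: the bound is typically loose for a fixed $\theta$, but it holds for every distribution on $[0,1]$ with mean $\theta$.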

In this note we address the following question: can statistical estimation 
be performed with bounded memory? The answer, of course, depends on the 
particular memory model we have in mind. A vast literature has dealt with 
analyzing the space complexity of various computations on data streams 
under different notions of memory (see the discussion in Section 3).
For the purpose of this paper, memory will be measured in the number of bits stored (or equivalently, in the number of tape squares written on by a Turing machine). This stringent definition precludes processing even a single infinite-precision real number, so without loss of generality we assume henceforth that the $X_i$ are $\{0,1\}$-valued. Assuming, as we do for now, that the $X_i$ are iid, we have that $X$ is a Bernoulli process, parametrized entirely by its mean $\theta = \mathbb{E}X_i = \mathbb{P}\{X_i = 1\} = 1 - \mathbb{P}\{X_i = 0\}$.

The empirical average in (1) may be computed naively by summing the $n$ bits $X_1, \ldots, X_n$ and dividing by $n$. A standard trick circumvents storing the entire bit sequence, by initializing $A_1 := X_1$ and updating

$$A_{n+1} := \frac{nA_n + X_{n+1}}{n+1}. \qquad (2)$$

This trick is infeasible under our model of memory since storing the integer $n$ requires $\Omega(\log n)$ bits. There remains, however, the possibility that some other scheme performs consistent estimation in bounded memory. Thus, rather than analyzing the behavior of a particular function, such as the empirical mean, on the data stream, we ask whether any function computable in bounded memory can be a consistent estimator of some distribution parameter.
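To make the memory cost of (2) concrete, here is a minimal Python sketch (the name `streaming_mean` is hypothetical) that applies the update with exact rational arithmetic and reports how many bits the counter $n$ occupies. The second return value is the point: the counter alone needs $\lfloor \log_2 n \rfloor + 1$ bits, so no fixed budget of bits suffices for all $n$.

```python
from fractions import Fraction
from typing import Iterable, Tuple


def streaming_mean(bits: Iterable[int]) -> Tuple[Fraction, int]:
    """Running empirical mean via update (2).

    Returns (A_n, bits needed for the counter n). The second value grows
    like log2(n), so no fixed memory budget suffices for all n.
    """
    n = 0
    mean = Fraction(0)
    for x in bits:
        # A_{n+1} = (n * A_n + X_{n+1}) / (n + 1); for n = 0 this sets A_1 = X_1.
        mean = (n * mean + x) / (n + 1)
        n += 1
    return mean, n.bit_length()
```

For instance, `streaming_mean([1, 0, 1, 1])` returns the mean $3/4$ together with 3 counter bits, and a stream of length 1000 already needs 10 bits for the counter.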

A consistent estimator of the Bernoulli parameter $\theta$ is a function $A : \{0,1\}^* \to \mathbb{R}$ such that $A(X_1, \ldots, X_n)$ converges in probability to $\theta$:

$$\lim_{n\to\infty} \mathbb{P}\{|A(X_1, \ldots, X_n) - \theta| > \varepsilon\} = 0 \qquad (3)$$

for all $\varepsilon > 0$. An obvious obstacle to achieving (3) is again the issue of precision: if $\theta$ takes values in some infinite set $\Theta$, then clearly (3) is impossible for any function $A$ computable with $O(1)$ bits of memory (as the latter can only distinguish among finitely many possibilities).

The main result of this paper is that even when the precision obstacle is 
removed, statistical estimation with bounded memory remains impossible. 
In particular, we prove 

Theorem 1.1. Suppose that $X_i \in \{0,1\}$ are Bernoulli random variables with parameter $\theta$ taking values in a fixed finite set $\Theta$ which contains distinct $\theta_0, \theta_1 \in (0,1)$. Then there is no consistent estimator for $\theta$ computable using $f(\Theta)$ bits of memory, for any function $f : \Theta \mapsto \mathbb{N}$.

1 This question was posed to us by Ronen Brafman, motivated by problems in Reinforcement Learning (Brafman and Tennenholtz, 200…).
This claim follows directly from a much more general result on the impossibility of estimating any nontrivial statistical functional of a Bernoulli process in bounded memory, proved in Theorem 5.1. In Theorem 6.1 this result is generalized further to a much broader class of random processes, including stationary ones with full support.

We also investigate a partial converse to Theorem 1.1, where we construct $\varepsilon$-consistent estimators as DFAs with $O(\log(1/\varepsilon))$ states (Theorem 8.2). This initiates the study of approximate statistical estimation by finite-state automata.



2 Outline of paper 

This paper is organized as follows. We briefly review streaming algorithms and their connection to regular approximations in Section 3 and set down the terminology used throughout the paper in Section 4. The main negative results are proved in Section 5 for the Bernoulli case and generalized to all stationary processes with full support in Section 6. Counterexamples to the main theorem are given in Section 7 for some pathological random processes. In Section 8 we give some results on approximate statistical estimation with DFAs. Finally, we give a brief recap and suggest some future directions in Section 9.



3 Background and related work 



The problem of efficiently (in time and space) extracting relevant information from a long sequence of data goes under the general heading of streaming algorithms. It appears that the earliest results along these lines were the papers of Morris (1978) and Flajolet and Martin (1985), in which (sub)logarithmic space was shown to suffice for approximating, respectively, the count and the number of unique items in a stream of length $n$. The field saw a surge of activity starting from the 1990s, including the seminal papers of Alon et al. (1999), Henzinger et al. (1999), and Feigenbaum et al. (2002), among many others. See Guha and McGregor (2009) for a recent result concerning quantile estimation and Muthukrishnan (2005) for a comprehensive survey of the subject.

A different line of research investigates approximations of non-regular languages by finite automata; see Eisman and Ravikumar (2005), Cordy and Salomaa (2007) and the references therein. Eisman and Ravikumar (2005) apparently made the first explicit connection between streaming algorithms and regular approximations. Indeed, since any bounded-memory algorithm can be implemented as a finite automaton, Theorem 1.1 may be recast as the following claim: if $A : \{0,1\}^* \to \Theta$ is a consistent estimator for the Bernoulli parameter $\theta \in \Theta$, then the language $L_t \subseteq \{0,1\}^*$ given by

$$L_t = A^{-1}(\{t\}) = \{x \in \{0,1\}^* : A(x) = t\}$$

is not regular for $0 < t < 1$.

Theorem 1.1 is also equivalent to the statement that any consistent estimator for $\mathbb{1}\{\theta > a\}$, $a \in (0,1)$, cannot have a regular support set. For $a = \frac12$, this last formulation is deceptively similar to Theorem 5 of Eisman and Ravikumar (2005), which states, roughly, that no regular language can approximate the majority language on a set of uniform measure more than one-half. However, as we show in Section 8, this is not true for biased Bernoulli measures. Nevertheless, the main ingredient in our proofs, namely Markov chain analysis on the states of the DFA, was largely inspired by the technique of Eisman and Ravikumar (2005).



4 Notation 



We follow the standard conventions for sets, languages, probability and automata. Thus, $\{0,1\}^*$ is the set of all finite bit strings and a language is any $L \subseteq \{0,1\}^*$. String length is denoted by $|x|$ and the notation

$$\{0,1\}^{<k} := \bigcup_{i<k} \{0,1\}^i = \{x \in \{0,1\}^* : |x| < k\}$$

is used. For any $f : \{0,1\}^* \to \{0,1\}$, its support set is

$$\mathrm{supp}(f) = f^{-1}(\{1\}) = \{x \in \{0,1\}^* : f(x) = 1\}.$$

A Deterministic Finite-state Automaton (DFA) over the alphabet $\{0,1\}$ is defined as the tuple $A = (Q, q_0, F, \delta)$ where

• $Q = \{1, 2, \ldots, n\}$ is a finite set of states

• $q_0 \in Q$ is the starting state

• $F \subseteq Q$ is the set of the accepting states

• $\delta : Q \times \{0,1\} \to Q$ is the deterministic transition function

(among the standard introductory texts on automata are Lewis and Papadimitriou (1981) and Sipser (2005)). The transition function $\delta$ may be extended to $Q \times \{0,1\}^*$ via the recursion

$$\delta(q, (u_1, u_2, \ldots, u_n)) = \delta(\delta(q, (u_1, u_2, \ldots, u_{n-1})), u_n).$$

We regularly blur the distinction between an automaton and its underlying (multi)graph, whose vertex set is $Q$ and whose edges are induced by $\delta$. We also abuse the notation slightly by identifying languages $L \subseteq \{0,1\}^*$ and automata $A$ with their characteristic functions $L, A : \{0,1\}^* \to \{0,1\}$, defined by $L(x) = \mathbb{1}\{x \in L\}$ (resp., $A(x) = \mathbb{1}\{\delta(q_0, x) \in F\}$). The probability $\mathbb{P}\{\cdot\}$ is always defined with respect to the random process $X$ specified in context, and we often use the shorthand $X^n = (X_1, X_2, \ldots, X_n)$. We say that the Bernoulli process $X$ has parameter $\theta$ if the $X_i$ are iid with

$$\mathbb{P}\{X_i = 1\} = \theta = 1 - \mathbb{P}\{X_i = 0\}.$$

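The definitions above translate directly into code. The sketch below is a minimal Python rendering under our own naming (`DFA`, `run`): the tuple $(Q, q_0, F, \delta)$ with $Q = \{1, \ldots, n\}$, the extended transition function (the recursion unrolled into a left-to-right loop), and the characteristic function $A(x) = \mathbb{1}\{\delta(q_0, x) \in F\}$.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Sequence, Tuple


@dataclass
class DFA:
    """DFA (Q, q0, F, delta) over {0, 1} with Q = {1, ..., n_states}."""
    n_states: int
    start: int
    accepting: FrozenSet[int]
    delta: Dict[Tuple[int, int], int]  # (state, bit) -> next state

    def run(self, q: int, u: Sequence[int]) -> int:
        """Extended transition delta(q, u): the recursion unrolled left to right."""
        for bit in u:
            q = self.delta[(q, bit)]
        return q

    def __call__(self, x: Sequence[int]) -> int:
        """Characteristic function A(x) = 1{delta(q0, x) in F}."""
        return int(self.run(self.start, x) in self.accepting)
```

For example, `DFA(2, 1, frozenset({2}), {(q, b): q if b == 0 else 3 - q for q in (1, 2) for b in (0, 1)})` is the two-state parity automaton, which accepts exactly the strings containing an odd number of 1s.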
5 The Bernoulli case 

We begin with the problem of estimating the Bernoulli parameter $\theta$. Although the result in this section is subsumed by the more general (and arguably simpler) Theorem 6.1, we present Theorem 5.1 here for expositional clarity.

In this section, a statistical functional $T : [0,1] \to \{0,1\}$ is any binary map acting on the Bernoulli parameter (e.g., $T(\theta) = \mathbb{1}\{\theta > 1/2\}$). We say that $T$ is nontrivial if it is not identically 0 or 1. As before, a consistent estimator for $T$ is any $A : \{0,1\}^* \to \{0,1\}$ such that $A(X^n)$ converges in probability to $T(\theta)$:

$$\lim_{n\to\infty} \mathbb{P}\{A(X^n) \ne T(\theta)\} = 0. \qquad (4)$$

We prove that finite automata cannot be consistent estimators, which in particular implies Theorem 1.1.

Theorem 5.1. Suppose $X$ is a Bernoulli process with parameter $\theta \in (0,1)$ and $T$ is a nontrivial statistical functional. If $A$ is a consistent estimator for $T$, then its support language, $L_A = \mathrm{supp}(A)$, is not regular.

Remark: this theorem was proved together with Gerald Eisman. 

Proof. Suppose to the contrary that the language $L_A \subseteq \{0,1\}^*$ is regular. Then it is recognized by some DFA $A = (Q, q_0, F, \delta)$. We may take $A$ to be a minimal DFA, and in particular, every state is reachable from the starting state $q_0$:

$$\forall q \in Q\ \exists u \in \{0,1\}^* : q = \delta(q_0, u). \qquad (5)$$

The Bernoulli process $X$ together with the transition function $\delta$ define a Markovian dynamics on $Q$ as follows. Defining the random variable $\xi_n \in Q$ to be the state of the automaton after reading the random string $(X_1, \ldots, X_n)$, we have $\mathbb{P}\{\xi_0 = q_0\} = 1$ and

$$\mathbb{P}\{\xi_{n+1} = q' \mid \xi_n = q\} = \theta\,\mathbb{1}\{\delta(q,1) = q'\} + (1-\theta)\,\mathbb{1}\{\delta(q,0) = q'\}.$$

Since $T$ is nontrivial, there are $\theta_0, \theta_1 \in (0,1)$ such that $T(\theta_0) = 0$ and $T(\theta_1) = 1$. Thus the property in (4) may be restated as

$$\lim_{n\to\infty} \mathbb{P}\{\xi_n \in F\} = T(\theta), \qquad \theta \in \{\theta_0, \theta_1\}, \qquad (6)$$

which says that the probability that the DFA $A$ is in an accepting state approaches either 0 or 1, depending on whether $\theta = \theta_0$ or $\theta = \theta_1$.

Since $Q$ is finite, the Markov chain has at least one ergodic component (see Kemeny and Snell (1976) for general facts about finite Markov chains). Furthermore, with probability 1, the chain enters an ergodic component after finitely many steps.

Let $\omega = (\omega_1, \omega_2, \ldots) \in \{0,1\}^{\mathbb{N}}$ be a particular realization of the Bernoulli process and let $\xi(\omega) = (\xi_1, \xi_2, \ldots)$ be the induced sequence of Markov states. Elementary theory of finite Markov chains implies the following:

(i) there is an ergodic component $E \subseteq Q$ such that

$$\mathbb{P}\{\xi_n \in E \text{ for all but finitely many } n \in \mathbb{N}\} = 1$$

(ii) for each $q \in E$ there is a $\pi_q > 0$ such that

$$\mathbb{P}\{\xi_n = q\} \ge \pi_q \qquad (7)$$

for infinitely many $n \in \mathbb{N}$.

Our final observation is that any ergodic component must contain both accepting and non-accepting states:

$$\emptyset \ne E \cap F \subsetneq E. \qquad (8)$$






This holds because the ergodic components depend only on the connective properties of the Markov chain and not on the actual value of the Bernoulli parameter $\theta$, as long as it is nontrivial (i.e., not 0 or 1). If there were a "homogeneous" ergodic component $E$, in the sense that $E \subseteq F$ or $E \cap F = \emptyset$, condition (6) would be violated, since for all $\theta \in (0,1)$ the Markov chain becomes trapped in $E$ with positive probability, as per (5).

But the conjunction of (7) and (8) contradicts (6), since the latter requires that $\mathbb{P}\{\xi_n \in S\} \to 0$ for $S = E \cap F$ or $S = E \setminus F$. Thus, we conclude that no DFA $A$ satisfying (6) exists. □
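The objects used in this proof can be computed for any concrete automaton. The Python sketch below uses our own helper names and stores the transition function as a dict keyed by (state, bit). It builds the transition matrix of the induced chain, propagates the law of $\xi_n$ to obtain $\mathbb{P}\{\xi_n \in F\}$, and finds the ergodic components as the closed strongly connected classes of the automaton graph; this identification is valid for any $\theta \in (0,1)$, since then every edge has positive probability. The brute-force reachability is adequate only for small automata.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Delta = Dict[Tuple[int, int], int]  # (state, bit) -> next state, states 1..n


def transition_matrix(n_states: int, delta: Delta, theta: float) -> List[List[float]]:
    """Row q-1 holds P{xi_{t+1} = q' | xi_t = q}
    = theta * 1{delta(q,1) = q'} + (1 - theta) * 1{delta(q,0) = q'}."""
    P = [[0.0] * n_states for _ in range(n_states)]
    for q in range(1, n_states + 1):
        P[q - 1][delta[(q, 1)] - 1] += theta
        P[q - 1][delta[(q, 0)] - 1] += 1 - theta
    return P


def accept_probability(n_states: int, delta: Delta, start: int,
                       accepting: Set[int], theta: float, n: int) -> float:
    """P{xi_n in F}, obtained by pushing the point mass at q0 through n steps."""
    P = transition_matrix(n_states, delta, theta)
    dist = [0.0] * n_states
    dist[start - 1] = 1.0
    for _ in range(n):
        dist = [sum(dist[i] * P[i][j] for i in range(n_states))
                for j in range(n_states)]
    return sum(dist[q - 1] for q in accepting)


def reachable(delta: Delta, q: int) -> Set[int]:
    """States reachable from q (including q itself) in the automaton graph."""
    seen, stack = {q}, [q]
    while stack:
        p = stack.pop()
        for bit in (0, 1):
            r = delta[(p, bit)]
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def ergodic_components(n_states: int, delta: Delta) -> List[FrozenSet[int]]:
    """Closed strongly connected classes: for theta in (0, 1) these are
    exactly the ergodic components of the induced Markov chain."""
    reach = {q: reachable(delta, q) for q in range(1, n_states + 1)}
    components = set()
    for q in reach:
        cls = frozenset(p for p in reach[q] if q in reach[p])
        if reach[q] == cls:  # nothing outside the class is reachable: closed
            components.add(cls)
    return sorted(components, key=min)


def satisfies_condition_8(component: FrozenSet[int], accepting: FrozenSet[int]) -> bool:
    """Condition (8): the component has both accepting and rejecting states."""
    inside = component & accepting
    return bool(inside) and inside != component
```

For the two-state automaton that records the last bit read and accepts after a 1 (`{(1, 0): 1, (1, 1): 2, (2, 0): 1, (2, 1): 2}` with $F = \{2\}$), the only ergodic component is $\{1, 2\}$, it satisfies (8), and $\mathbb{P}\{\xi_n \in F\} = \theta$ for every $n \ge 1$, so the limit in (6) is never 0 or 1.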



6 Stationary processes with full support 

We refer the reader to Kallenberg (2002) for the relevant background on random processes. The $\{0,1\}$-valued process $X = (X_1, X_2, \ldots)$ is said to be stationary if

$$\mathbb{P}\{(X_{t_1}, X_{t_2}, \ldots, X_{t_m}) = x\} = \mathbb{P}\{(X_{t_1+k}, X_{t_2+k}, \ldots, X_{t_m+k}) = x\}$$

for all $k, m \ge 1$, all $0 < t_1 < \cdots < t_m$, and all $x \in \{0,1\}^m$. We say that $X$ has full support if every finite realization occurs with positive probability:

$$\mathbb{P}\{(X_1, \ldots, X_n) = x\} > 0 \qquad (9)$$

for all $n \ge 1$ and all $x \in \{0,1\}^n$.
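A simple family satisfying both conditions, beyond iid Bernoulli processes, is a two-state Markov chain on $\{0,1\}$ with transition probabilities in $(0,1)$, started from its stationary law. The sketch below is our own illustration (the parameters `a`, `b`, `init` and all function names are hypothetical). It computes probabilities of contiguous patterns and checks shift invariance (the special case $t_j = j$ of the stationarity condition) and positivity (9), up to a small pattern length.

```python
import itertools
from typing import Iterator, Sequence, Tuple


def stationary_law(a: float, b: float) -> Tuple[float, float]:
    """Stationary law (P{X=0}, P{X=1}) of the chain with P(0->1) = a, P(1->0) = b."""
    return (b / (a + b), a / (a + b))


def pattern_prob(x: Sequence[int], a: float, b: float,
                 init: Tuple[float, float], offset: int = 0) -> float:
    """P{(X_{1+offset}, ..., X_{m+offset}) = x} for the binary Markov chain with
    P{X_{t+1}=1 | X_t=0} = a, P{X_{t+1}=0 | X_t=1} = b and law `init` for X_1."""
    T = [[1 - a, a], [b, 1 - b]]
    p = list(init)
    for _ in range(offset):  # law of X_{1+offset}
        p = [p[0] * T[0][0] + p[1] * T[1][0], p[0] * T[0][1] + p[1] * T[1][1]]
    prob = p[x[0]]
    for prev, cur in zip(x, x[1:]):
        prob *= T[prev][cur]
    return prob


def _patterns(max_len: int) -> Iterator[Tuple[int, ...]]:
    for m in range(1, max_len + 1):
        yield from itertools.product((0, 1), repeat=m)


def shift_invariant(a, b, init, max_len=3, shift=5, tol=1e-12) -> bool:
    """Contiguous-pattern case of stationarity, checked up to length max_len."""
    return all(abs(pattern_prob(x, a, b, init) - pattern_prob(x, a, b, init, shift)) < tol
               for x in _patterns(max_len))


def full_support(a, b, init, max_len=3) -> bool:
    """Condition (9), checked for all x of length at most max_len."""
    return all(pattern_prob(x, a, b, init) > 0 for x in _patterns(max_len))
```

Starting the same chain deterministically at $X_1 = 0$ breaks both properties: $\mathbb{P}\{X_1 = 1\} = 0$, and the law of $X_1$ differs from that of $X_6$.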

The extension of Theorem 5.1 to the much broader class of stationary processes with full support is quite straightforward, requiring only minor additional abstraction. Thus, if $\mathcal{M}$ is any collection of discrete-time, $\{0,1\}$-valued random processes, then each process is determined by some measure $\mu$ on $\{0,1\}^{\mathbb{N}}$. A statistical functional is any mapping $T : \mathcal{M} \to \{0,1\}$, and $T$ is nontrivial if there are $\mu_0, \mu_1 \in \mathcal{M}$ such that $T(\mu_0) \ne T(\mu_1)$. Finally, $A : \{0,1\}^* \to \{0,1\}$ is a consistent estimator for $T$ if

$$\lim_{n\to\infty} \mu\{x \in \{0,1\}^n : A(x) \ne T(\mu)\} = 0 \qquad (10)$$

for all $\mu \in \mathcal{M}$; a shorthand way of writing the above is $\mathbb{P}\{A(X^n) \ne T(\mu)\} \to 0$.

We are ready to proceed with the generalization: 

Theorem 6.1. Let $\mathcal{M}$ be a collection of stationary $\{0,1\}$-valued processes with full support and suppose $T : \mathcal{M} \to \{0,1\}$ is a nontrivial functional. Then a consistent estimator for $T$ cannot have regular support.






Proof. Assume to the contrary that $T$ has an estimator with a regular support set, and that the latter is recognized by the minimal DFA $A = (Q, q_0, F, \delta)$. Let $\omega = (\omega_1, \omega_2, \ldots) \in \{0,1\}^{\mathbb{N}}$ be a particular realization of the random process $X$ and let $\xi = (\xi_1, \xi_2, \ldots)$ be the induced sequence of states $\xi_i \in Q$ traversed by the automaton when reading $X$. We claim that there is a strongly connected component $E \subseteq Q$ such that

(i) $\mathbb{P}\{\xi_n \in E \text{ for all but finitely many } n \in \mathbb{N}\} = 1$

(ii) there is a $\pi_0 > 0$ such that for all $q \in E$ we have $\mathbb{P}\{\xi_n = q\} \ge \pi_0$ for infinitely many $n \in \mathbb{N}$.

Part (i) follows by elementary graph theory, since any directed graph decomposes into transient and strongly connected (SC) components, and $E \subseteq Q$ is simply the SC component in which $\xi_n$ eventually becomes trapped. The process will not become trapped in a transient component $H$ because, by the full support property (9), there is a positive escape probability from $H$, and by stationarity this escape probability cannot decrease with $n$. To prove (ii), let $y \in Q^m$ be any path that traverses all the elements of $E$ (which must exist since $E$ is SC) and let $x \in \{0,1\}^m$ be the corresponding underlying bit sequence. By stationarity, $\pi_0 = \mathbb{P}\{(X_{n+1}, \ldots, X_{n+m}) = x\}$ is a well-defined quantity independent of $n$, and by (9) it must be positive.

Finally, we argue that

$$\emptyset \ne E \cap F \subsetneq E,$$

since for all $\mu \in \mathcal{M}$ the process $\xi_n$ becomes trapped in the SC component $E$ with positive probability, and in order for $A$ to satisfy (10), $E$ must contain both accepting and non-accepting states. The remainder of the proof proceeds analogously to that of Theorem 5.1. □

7 Counterexamples 

Theorem 6.1 gives sufficient conditions for a random process not to admit consistent finite-state statistical estimators. In this section, we give examples of such estimators for processes violating the conditions of stationarity and full support. The basic intuition is that a DFA cannot accumulate statistical information; it can only be driven into a certain state by the process (as in Section 7.1) or exploit forbidden patterns (as in Section 7.2).
Figure 1: A two-state automaton distinguishes 0-dominant processes from 1-dominant ones.

7.1 Non-stationary process 

Let $\mathcal{M}$ be the collection of $\{0,1\}$-valued processes $X = (X_1, X_2, \ldots)$ where for each sample path we have

$$\lim_{n\to\infty} \mathbb{P}\{X_n = 0\} \in \{0, 1\}.$$

In other words, any realization of $X$ eventually becomes dominated entirely by 0s or 1s. Processes of this type are clearly not stationary, and a simple automaton (Figure 1) distinguishes the 0-dominant processes from the 1-dominant ones. It is not difficult to verify that this automaton will occupy state $q_j$ with probability approaching 1 when reading the realization of a $j$-dominant process, for $j \in \{0,1\}$.
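Since Figure 1 is not reproduced here, the sketch below assumes one natural two-state automaton consistent with the description: state $q_j$ simply records the last bit read, i.e., $\delta(q, j) = q_j$, with $q_0$ taken as the start state. It also generates a hypothetical $j$-dominant sample path with $\mathbb{P}\{X_i = j\} = 1 - 1/(i+1)$. Both the automaton and the process are our own illustrative choices, not the original figure.

```python
import random
from typing import Iterable, List


def last_bit_state(bits: Iterable[int]) -> int:
    """Assumed reading of Figure 1: delta(q, j) = q_j, so the state is the last bit.
    Returns j for state q_j; the start state is taken to be q_0 (an assumption)."""
    state = 0
    for x in bits:
        state = x
    return state


def dominated_path(n: int, j: int, seed: int = 0) -> List[int]:
    """Hypothetical j-dominant, non-stationary path: P{X_i = j} = 1 - 1/(i + 1)."""
    rng = random.Random(seed)
    return [j if rng.random() < 1 - 1 / (i + 1) else 1 - j for i in range(1, n + 1)]
```

Under this process the final state after $n$ steps is $q_j$ with probability $1 - 1/(n+1)$, which tends to 1.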

7.2 Process without full support 

Let $\mathcal{M}$ be the collection of $\{0,1\}$-valued iid Bernoulli processes $X = (X_1, X_2, \ldots)$ with parameter $\theta \in [0,1]$. We call the processes with $\theta \in \{0,1\}$ degenerate and those with $\theta \in (0,1)$ nondegenerate. The processes comprising $\mathcal{M}$ are stationary but do not all have full support, and a simple automaton (Figure 2) distinguishes degenerate processes from nondegenerate ones. It is easily verified that this automaton will occupy state $q_{01}$ with probability approaching 1 when reading a nondegenerate process and will become trapped either in state $q_0$ or $q_1$ when reading a degenerate process.
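Figure 2 is likewise not reproduced; a four-state automaton matching the description (a start state, states $q_0$ and $q_1$ for "only 0s" and "only 1s" read so far, and an absorbing state $q_{01}$ once both symbols have appeared) can be sketched as follows. This reconstruction is our assumption, not the original drawing. For iid Bernoulli($\theta$) input, the probability of occupying $q_{01}$ after $n \ge 1$ steps is $1 - \theta^n - (1-\theta)^n$, which tends to 1 when $\theta \in (0,1)$ and equals 0 when $\theta \in \{0,1\}$.

```python
from typing import Iterable

# Assumed reconstruction of Figure 2 (not the original drawing):
# "s" = start, "q0"/"q1" = only 0s/only 1s read so far, "q01" = both read (absorbing).
DELTA = {
    ("s", 0): "q0", ("s", 1): "q1",
    ("q0", 0): "q0", ("q0", 1): "q01",
    ("q1", 0): "q01", ("q1", 1): "q1",
    ("q01", 0): "q01", ("q01", 1): "q01",
}


def final_state(bits: Iterable[int]) -> str:
    """State reached from "s" after reading the given bits."""
    state = "s"
    for x in bits:
        state = DELTA[(state, x)]
    return state


def prob_q01(theta: float, n: int) -> float:
    """P{the automaton is in q01 after n steps} for iid Bernoulli(theta) input:
    both symbols have appeared, i.e. 1 - theta^n - (1 - theta)^n for n >= 1."""
    if n < 1:
        return 0.0
    return 1 - theta ** n - (1 - theta) ** n
```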

8 Approximate statistics with a DFA 

We revisit the problem of approximating the Bernoulli parameter $\theta$ with a DFA. As discussed in Section 1, this question is only meaningful if $\theta$ is allowed to take values in some finite set $\Theta$. Suppose for concreteness that $\Theta = \{0 < \theta_0 < \theta_1 < \cdots < \theta_m < 1\}$. Then the problem of determining whether $\theta = \theta_j \in \Theta$ is reduced to deciding whether $\theta > \theta_{j-1}$ and $\theta < \theta_{j+1}$. Of course, by Theorem 5.1, a consistent estimator for $T_a(\theta) = \mathbb{1}\{\theta > a\}$ cannot be realized by any DFA. Consider, however, relaxing the requirement of consistency in (4) to $\varepsilon$-consistency:

$$\limsup_{n\to\infty} \mathbb{P}\{A(X^n) \ne T(\theta)\} \le \varepsilon. \qquad (11)$$

Figure 2: A four-state automaton distinguishes degenerate processes from nondegenerate ones.

We shall examine the case of $T_{1/2}$ in some detail. To this end, recall the majority function $\mathrm{MAJ} : \{0,1\}^* \to \{0,1\}$, defined by

$$\mathrm{MAJ}(x) = \mathbb{1}\left\{\sum_{i=1}^{|x|} x_i > \frac{|x|}{2}\right\}.$$

We observe that any consistent estimator of $\mathbb{1}\{\theta > 1/2\}$ must asymptotically agree with MAJ:

Theorem 8.1. Let $X$ be a Bernoulli process with parameter $\theta \ne \frac12$ and suppose that $A : \{0,1\}^* \to \{0,1\}$ is a consistent estimator of the functional $T_{1/2} : \theta \mapsto \mathbb{1}\{\theta > 1/2\}$. Then

$$\lim_{n\to\infty} \mathbb{P}\{A(X^n) \ne \mathrm{MAJ}(X^n)\} = 0.$$

Proof. Assume without loss of generality that $\theta > \frac12$, so $T_{1/2}(\theta) = 1$. Then

$$\mathbb{P}\{\mathrm{MAJ}(X^n) \ne 1\} = \mathbb{P}\left\{\frac1n\sum_{i=1}^{n} X_i \le \frac12\right\} \le \exp\left(-2n\left(\theta - \tfrac12\right)^2\right), \qquad (12)$$

where the last inequality is Hoeffding's; a similar analysis holds for $\theta < \frac12$. Thus,

$$\mathbb{P}\{A(X^n) \ne \mathrm{MAJ}(X^n)\} \le \mathbb{P}\{A(X^n) \ne T_{1/2}(\theta)\} + \mathbb{P}\{\mathrm{MAJ}(X^n) \ne T_{1/2}(\theta)\} \to 0,$$

where the first term goes to 0 because $A$ is a consistent estimator for $T_{1/2}$ and the second term vanishes by (12). □

Figure 3: The automaton $M(5)$ agrees with MAJ on all $x \in \{0,1\}^{<5}$. The general $M(k) = (Q, q_0, F, \delta)$ is constructed as follows: $Q = \{1, \ldots, k\}$, $q_0 = \lceil k/2 \rceil$, $F = \{\lceil k/2 \rceil + 1, \ldots, k\}$ and $\delta(i, 0) = i - 1 + \mathbb{1}\{i = 1\}$, $\delta(i, 1) = i + 1 - \mathbb{1}\{i = k\}$ for $1 \le i \le k$.
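Bound (12) can be compared with the exact tail, which is a binomial sum. The following Python sketch (function names are ours) computes $\mathbb{P}\{\mathrm{MAJ}(X^n) = 0\} = \mathbb{P}\{\sum_i X_i \le \lfloor n/2 \rfloor\}$ exactly with `math.comb` and evaluates the right-hand side of (12).

```python
import math


def prob_maj_fails(theta: float, n: int) -> float:
    """Exact P{MAJ(X^n) = 0} = P{sum X_i <= floor(n/2)} for X_i iid Bernoulli(theta)."""
    return sum(math.comb(n, s) * theta ** s * (1 - theta) ** (n - s)
               for s in range(0, n // 2 + 1))


def hoeffding_12(theta: float, n: int) -> float:
    """Right-hand side of (12): exp(-2 n (theta - 1/2)^2)."""
    return math.exp(-2 * n * (theta - 0.5) ** 2)
```

For $\theta = 0.7$ the exact probability stays below the bound for every $n$ and decays with it; the bound is far from tight for small $n$.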



Let $M(k)$ be the minimal DFA² which agrees with MAJ on all binary strings of length less than $k$; these are illustrated in Figure 3. One might inquire how well $M(k)$ approximates $\mathbb{1}\{\theta > 1/2\}$ on long Bernoulli sequences, and the following theorem provides an answer:

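Theorem 8.2. Let $X$ be a Bernoulli process with parameter $\theta \in (0,1) \setminus \{\frac12\}$ and let $M(k)$ be defined as above. Then

$$\eta = \lim_{n\to\infty} \mathbb{P}\{X^n \notin M(k)\} = \frac{\beta^{\lfloor k/2 \rfloor} - \beta^{k}}{1 - \beta^{k}}, \qquad \beta = \theta^{-1} - 1. \qquad (13)$$

For $k$ even and $\frac12 < \theta < 1$, we have

$$\eta \le \tfrac12 (2 - 2\theta)^{k/2}. \qquad (14)$$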
Remark: we thank Daniel Dadush for help with this calculation. 



2 One may take the family of automata constructed in Figure 3 as the definition of $M(k)$ and prove as a simple exercise that this is indeed the smallest DFA agreeing with MAJ on all of $\{0,1\}^{<k}$. For small values of $k$, the techniques of Trakhtenbrot and Barzdin' (1973) or Angluin (1987) may be used to construct the minimal DFA agreeing with a given membership oracle on $\{0,1\}^{<k}$.
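Proof. The Bernoulli process $X$ induces the Markov chain $\xi = (\xi_n)$, $\xi_n \in Q$, on the states of $M(k)$ as described in the proof of Theorem 5.1. By construction of the DFA $M(k)$, the induced Markov chain is ergodic (for a visual illustration, relabel every "1" edge in Figure 3 with $\theta$ and every "0" edge with $1 - \theta$). Its unique stationary distribution $\pi \in \mathbb{R}^k$ has the interpretation

$$\pi_q = \lim_{n\to\infty} \mathbb{P}\{\xi_n = q\}$$

and obeys the balance equations

$$\pi_1 = (1-\theta)(\pi_1 + \pi_2), \qquad \pi_i = \theta\pi_{i-1} + (1-\theta)\pi_{i+1} \ \ (1 < i < k), \qquad \pi_k = \theta(\pi_{k-1} + \pi_k).$$

These are satisfied by any vector obeying the detailed-balance relation $\theta\pi_i = (1-\theta)\pi_{i+1}$, in particular by

$$\tilde\pi_i = \theta^{i-1}(1-\theta)^{k-i}, \qquad 1 \le i \le k,$$

which must be normalized to make it into a probability distribution. The accepting states $F \subseteq Q = \{1, \ldots, k\}$ of $M(k)$ are all $q > \lceil k/2 \rceil$, and so the limiting probability of being in a rejecting state is given by

$$\eta = \frac{\sum_{i=1}^{\lceil k/2 \rceil} \theta^{i-1}(1-\theta)^{k-i}}{\sum_{i=1}^{k} \theta^{i-1}(1-\theta)^{k-i}}.$$

Dividing numerator and denominator by $\theta^{k-1}$ turns both sums into geometric series in $\beta = \theta^{-1} - 1$, which sum up to yield (13). From there, obtaining (14) is a matter of simple calculus (for $k$ even, (13) reduces to $\eta = \beta^{k/2}/(1 + \beta^{k/2})$). □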
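The construction of $M(k)$ and the computation in this proof can both be checked by brute force for small $k$. The Python sketch below (all names are our own) builds $M(k)$ as a saturating counter with start state $\lceil k/2 \rceil$, verifies agreement with MAJ on $\{0,1\}^{<k}$, checks that the normalized $\tilde\pi$ is stationary for the induced chain, and compares the limiting rejection probability with the closed form $(\beta^{\lfloor k/2 \rfloor} - \beta^k)/(1 - \beta^k)$, $\beta = \theta^{-1} - 1$.

```python
import itertools
from typing import Dict, List, Set, Tuple


def majority_dfa(k: int) -> Tuple[int, Set[int], Dict[Tuple[int, int], int]]:
    """M(k) as a saturating counter on Q = {1, ..., k}.
    Start q0 = ceil(k/2), accepting F = {ceil(k/2) + 1, ..., k}."""
    start = (k + 1) // 2
    accepting = set(range(start + 1, k + 1))
    delta = {}
    for i in range(1, k + 1):
        delta[(i, 0)] = max(i - 1, 1)  # i - 1 + 1{i = 1}
        delta[(i, 1)] = min(i + 1, k)  # i + 1 - 1{i = k}
    return start, accepting, delta


def maj(x) -> int:
    return int(2 * sum(x) > len(x))


def agrees_on_short_strings(k: int) -> bool:
    """Brute-force check that M(k)(x) == MAJ(x) for all x in {0,1}^{<k}."""
    start, accepting, delta = majority_dfa(k)
    for length in range(k):
        for x in itertools.product((0, 1), repeat=length):
            q = start
            for bit in x:
                q = delta[(q, bit)]
            if int(q in accepting) != maj(x):
                return False
    return True


def pi_tilde(k: int, theta: float) -> List[float]:
    """Normalized pi_i proportional to theta^(i-1) * (1-theta)^(k-i), i = 1..k."""
    w = [theta ** (i - 1) * (1 - theta) ** (k - i) for i in range(1, k + 1)]
    total = sum(w)
    return [v / total for v in w]


def is_stationary(k: int, theta: float, tol: float = 1e-12) -> bool:
    """One step of the induced chain leaves pi_tilde unchanged."""
    _, _, delta = majority_dfa(k)
    pi = pi_tilde(k, theta)
    nxt = [0.0] * k
    for i in range(1, k + 1):
        nxt[delta[(i, 1)] - 1] += theta * pi[i - 1]
        nxt[delta[(i, 0)] - 1] += (1 - theta) * pi[i - 1]
    return all(abs(u - v) < tol for u, v in zip(nxt, pi))


def rejection_probability(k: int, theta: float) -> float:
    """eta: stationary mass on the rejecting states 1, ..., ceil(k/2)."""
    return sum(pi_tilde(k, theta)[: (k + 1) // 2])


def closed_form_13(k: int, theta: float) -> float:
    """(beta^floor(k/2) - beta^k) / (1 - beta^k) with beta = 1/theta - 1 (theta != 1/2)."""
    beta = 1 / theta - 1
    return (beta ** (k // 2) - beta ** k) / (1 - beta ** k)
```

For example, with $k = 6$ and $\theta = 0.75$ both routes give $\eta = 1/28 \approx 0.036$, below the value $0.0625$ of the right-hand side of (14).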

It follows from Theorem 8.2, together with the symmetry of $M(2k)$ under $\theta \leftrightarrow 1 - \theta$, that the DFA $M(2k)$ will disagree with the majority function on long runs of Bernoulli processes with parameter $\theta$ with probability at most

$$R(\theta, 2k) = \tfrac12\left(1 - 2\left|\theta - \tfrac12\right|\right)^{k}.$$

Note that

$$\lim_{k\to\infty} R(\theta, k) = 0, \qquad \theta \ne \tfrac12, \qquad (15)$$

exponentially fast, while

$$\lim_{\theta\to 1/2} R(\theta, k) = \tfrac12. \qquad (16)$$

Figure 4: The automaton $M_{1/3}(7)$ for approximately estimating $T_{1/3}$.

Figure 5: The limiting behavior of $M_{1/3}(7)$ and $M_{1/3}(24)$, whose sizes are 9 and 29, respectively, as a function of the Bernoulli parameter $\theta$. The automata were obtained by Angluin's algorithm (Angluin, 1987).

This approach can be generalized to obtain approximate finite-state estimators for the Bernoulli statistical functional $T_a(\theta) = \mathbb{1}\{\theta > a\}$ for $a \in (0,1)$. For $k \in \mathbb{N}$ and $a \in (0,1)$, define $M_a(k)$ to be the (unique) smallest DFA which agrees on all $x \in \{0,1\}^{<k}$ with the function $\mathrm{MAJ}_a : \{0,1\}^* \to \{0,1\}$, defined by

$$\mathrm{MAJ}_a(x) = \mathbb{1}\left\{\sum_{i=1}^{|x|} x_i > a|x|\right\}$$

($M_{1/3}(7)$ is illustrated in Figure 4). We can associate to each $M_a(k)$ an ergodic Markov chain with a unique stationary distribution, as done in the proof of Theorem 5.1. Thus, each $M_a(k)$ has a well-defined limiting acceptance probability

$$\rho_a(k, \theta) = \lim_{n\to\infty} \mathbb{P}\{X^n \in M_a(k)\}$$

as well as a limiting probability of error

$$R_a(k, \theta) = \lim_{n\to\infty} \mathbb{P}\{M_a(k)(X^n) \ne T_a(\theta)\}$$

(the curves of $\rho_{1/3}(7, \cdot)$ and $\rho_{1/3}(24, \cdot)$ are plotted in Figure 5). It is not difficult to show, using arguments analogous to those in Theorem 8.1, that

$$\lim_{k\to\infty} R_a(k, \theta) = 0, \qquad \theta \ne a.$$

This is a natural generalization of (15) and (16) for $a \ne \frac12$; we leave the analysis of the convergence rates for future work.

Contrast these results with a theorem of Eisman and Ravikumar (2005), which may be stated as follows.

Theorem 8.3 (Eisman and Ravikumar (2005)). Let $X$ be a Bernoulli process with parameter $\theta = \frac12$ and suppose that $L \subseteq \{0,1\}^*$ is a regular language. Then

$$\limsup_{n\to\infty} \mathbb{P}\{L(X^n) \ne \mathrm{MAJ}_{1/2}(X^n)\} \ge \tfrac12.$$
The moral is that for a given Bernoulli process with parameter $\theta$, the majority function can be $\varepsilon$-approximated (in the sense of (11)) by a DFA with $O(\log(1/\varepsilon))$ states, and the approximation gets progressively worse as $\theta$ approaches $\frac12$. Thus, (15) and (16) provide a converse of sorts to Theorem 8.3, which eliminates the possibility of a better than $\frac12$ approximation to MAJ by any DFA under the unbiased Bernoulli process.
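The $O(\log(1/\varepsilon))$ state count can be made explicit from bound (14). The following sketch is a hypothetical helper, valid only for $\frac12 < \theta < 1$ (by symmetry the same count applies to $1 - \theta$): it returns the smallest even $k$ for which the right-hand side of (14) is at most $\varepsilon$. Since (14) is only an upper bound, the true minimal $k$ may be smaller.

```python
def states_needed(theta: float, eps: float) -> int:
    """Smallest even k with 0.5 * (2 - 2*theta) ** (k/2) <= eps, i.e. the size of
    M(k) guaranteed by bound (14) to have limiting error at most eps."""
    if not 0.5 < theta < 1:
        raise ValueError("bound (14) assumes 1/2 < theta < 1")
    k = 2
    while 0.5 * (2 - 2 * theta) ** (k // 2) > eps:
        k += 2
    return k
```

For $\theta = 0.75$ this gives $k = 12$ at $\varepsilon = 10^{-2}$ and $k = 26$ at $\varepsilon = 10^{-4}$, so the count grows linearly in $\log(1/\varepsilon)$; it blows up as $\theta \to \frac12$, since then $2 - 2\theta \to 1$.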

See Cordy and Salomaa (2007) for other results on approximating non-regular languages by DFAs.



9 Discussion and future work 

We have shown that consistent statistical estimation is not realizable by finite-state automata, but if the consistency requirement is relaxed, efficient $\varepsilon$-approximations exist. The negative result holds for the broad class of stationary processes with full support.

Along the way, we encountered several insights. In Section 7 we saw
that although a DFA cannot accumulate statistical information, it can ex- 
ploit a time drift or forbidden patterns in the random process. It would be 
interesting to make this intuition more rigorous — for example, by giving a 
full characterization of the random processes that do not admit consistent 
finite-state estimators of nontrivial statistical functionals. 

The observations in Section 8 raise a number of interesting questions. We conjecture that the family of DFAs we constructed to approximate the majority function in Theorem 8.2 is optimal in the sense that any finite-state $\varepsilon$-approximation (see (11)) to MAJ must use $\Omega(\log(1/\varepsilon))$ states.

A general method for approximating the Bernoulli parameter $\theta$ is suggested in Section 8: use $M_a(k)$ and $M_b(k)$ with large $k$ to pinpoint $\theta$, with high probability, to the interval $(a, b)$. It would be interesting to analyze the asymptotic error $R_a(k, \theta)$ and the size of $M_a(k)$ as functions of $(k, a, \theta)$ and perhaps establish an optimality property of some sort for this class of estimators.